Separating Speech From Noise Challenge
Abstract
We used data from the PASCAL CHiME challenge to train a Support Vector Machine (SVM) to estimate a noise mask that labels each time-frame/frequency-bin of the audio as 'reliable' or 'unreliable'. Another block in the signal processing pipeline could use this mask to treat the unreliable data as missing and replace it with an estimate of the clean audio, found by searching a corpus of clean audio samples for the most probable match given the features of the unreliable data and the surrounding audio. For this project, we focused on noise mask estimation using an SVM and not on the data imputation portion of the problem. Kallasjoki et al. [1] demonstrated that, given a good noise mask, replacing unreliable portions of the audio with estimates of the clean audio yields significant improvements in automated speech recognition accuracy.
To judge the SVM's classification accuracy in generating a noise mask, we needed an 'oracle mask' that gives the correct answer for the mask. The oracle mask labels each time/frequency pair as reliable or unreliable using the Mel filterbank energies of the clean signal and the noisy signal; a pair is labeled unreliable if its SNR is less than -3 dB. Using the oracle mask, we estimated the best-case performance of noise mask estimation and data imputation by replacing the unreliable time/frequency segments (as labeled by the oracle mask) with the known clean speech audio. As expected, this achieves very good recognition rates that approach the accuracy obtained using the clean speech audio. As noted above, previous work [1] demonstrated significant performance improvements using an oracle mask along with sparse imputation methods. Estimating the noise mask was a weak point in that work, and our goal has been to improve noise mask generation using machine learning methods.
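The oracle-mask rule and the best-case imputation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names `oracle_mask` and `oracle_impute` are hypothetical, the inputs are assumed to be linear (not log) Mel filterbank energies arranged as lists of frames, and the noise energy is assumed to be approximately the noisy energy minus the clean energy. The -3 dB threshold is the one stated in the abstract.

```python
import math

SNR_THRESHOLD_DB = -3.0  # threshold stated in the abstract


def oracle_mask(clean, noisy, threshold_db=SNR_THRESHOLD_DB, eps=1e-12):
    """Label each time/frequency cell: True = reliable, False = unreliable.

    clean, noisy: lists of frames; each frame is a list of linear Mel
    filterbank energies (assumed layout, not taken from the source).
    """
    mask = []
    for clean_frame, noisy_frame in zip(clean, noisy):
        row = []
        for c, y in zip(clean_frame, noisy_frame):
            # Assumption: noise energy is roughly (noisy - clean).
            noise = max(y - c, eps)
            snr_db = 10.0 * math.log10(max(c, eps) / noise)
            row.append(snr_db >= threshold_db)
        mask.append(row)
    return mask


def oracle_impute(clean, noisy, mask):
    """Best-case imputation: keep reliable noisy cells and replace
    unreliable cells with the known clean value."""
    return [
        [y if reliable else c for c, y, reliable in zip(cf, nf, mf)]
        for cf, nf, mf in zip(clean, noisy, mask)
    ]


clean = [[1.0, 1.0]]
noisy = [[1.5, 4.0]]  # second cell is dominated by noise
mask = oracle_mask(clean, noisy)
imputed = oracle_impute(clean, noisy, mask)
```

In this toy frame the first cell has an SNR of about 3 dB and is kept, while the second has an SNR of about -4.8 dB and is replaced by the clean value.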
Since the automatic speech recognition system used in the CHiME project (HTK) uses the Mel filterbank energies as features for a hidden Markov model, we started by using these same filterbank energies as features for the support vector machine to estimate a noise mask. We have used the freely available tools LIBLINEAR [2] and LIBSVM [3] for training and prediction rather than writing an SVM from scratch, allowing us to focus on feature selection, kernel selection, …
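As a rough illustration of classifying filterbank features as reliable or unreliable, the sketch below trains a linear SVM on toy data. It is not the authors' setup: they used LIBLINEAR and LIBSVM, whereas this example swaps in a plain subgradient-descent solver for the hinge loss so that it runs without external libraries. The function names, hyperparameters, and the two-dimensional toy features are all assumptions made for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {+1, -1}
    (+1 = reliable, -1 = unreliable in this toy example).
    """
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Margin violated: regularization step plus hinge step.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: regularization shrinkage only.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (reliable) or -1 (unreliable) for one feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy features, e.g. two log filterbank energies per cell.
X_train = [[2.0, 0.5], [1.8, 0.4], [0.3, 1.9], [0.5, 2.2]]
y_train = [1, 1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
predictions = [predict(w, b, x) for x in [[2.1, 0.3], [0.2, 2.0]]]
```

A library solver such as LIBLINEAR would replace `train_linear_svm` in practice; the hand-written loop is only meant to show the shape of the problem.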
Similar resources
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signal may be classified as 'noise' in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
Separating Speech from Speech Noise to Improve Intelligibility: International Exchange Supplement
CS 229 Final Project Report: Speech & Noise Separation
In this course project I investigated machine learning approaches to separating speech signals from background noise. Keywords: MFCC, SVM, noise separation, source separation, spectrogram
Greater benefit for familiar talkers under cognitive load
Earlier work has demonstrated that cognitive resources are expended when processing the speech of an unfamiliar talker. As such, processing the speech of a familiar talker is a more efficient, automated process. Similarly, data have shown that separating a speech signal from noise also uses cognitive resources and listeners with larger working memory capacities are better able to perceive speec...
Envelope-based inter-aural time difference localization training to improve speech-in-noise perception in the elderly
Background: Many elderly individuals complain of difficulty in understanding speech in noise despite having normal hearing thresholds. According to previous studies, auditory training leads to improvement in speech-in-noise perception, but these studies did not consider the etiology, so their results cannot be generalized. The present study aimed at investigating the effectiveness of envelope-b...
Accurate estimation of sinusoidal parameters in an harmonic+noise model for speech synthesis
We present here a Harmonic+Noise Model (HNM) for speech synthesis. The noise part is represented by an autoregressive model whose output is pitch-synchronously modulated in energy. The harmonic part of the signal is represented by a sinusoidal model. This paper compares different methods for separating these two components. We then propose a method for the estimation of the sinusoidal parameters...